Modeling Informatively Missing Genotypes in Haplotype Analysis.
Abstract
Missing genotypes are common in practical genetic studies. Most existing statistical methods, including those for haplotype analysis, assume that genotypes are missing at random; that is, at a given marker, different genotypes and different alleles are missing with the same probability. In our previous work, we demonstrated that violating this assumption may lead to serious bias in haplotype frequency estimates and haplotype association analysis. We proposed a general missing data model that simultaneously characterizes missing data patterns across a set of two or more biallelic markers, and we proved that, under this model, haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers. In this study, we extend our work to multi-allelic markers and observe a similar finding. Simulation studies of haplotypes consisting of two markers illustrate that our proposed model can reduce the bias in haplotype frequency estimates caused by incorrect assumptions about the missing data mechanism. Finally, we illustrate the utility of our method by applying it to a real data set from a study of scleroderma.
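To make the missing-at-random assumption concrete, here is a minimal sketch of how genotype-dependent missingness can bias a naive estimate. It is not the authors' model. It assumes a single biallelic marker in Hardy-Weinberg equilibrium rather than a multi-marker haplotype, and the function names and missingness probabilities are hypothetical, chosen only for illustration.

```python
def hwe_genotype_freqs(p):
    """Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium,
    where p is the frequency of allele A."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def naive_allele_freq(p, miss):
    """Large-sample limit of the complete-case estimate of the A allele frequency.

    `miss` maps each genotype to its (hypothetical) probability of being
    missing. Individuals with a missing genotype are simply dropped.
    """
    freqs = hwe_genotype_freqs(p)
    copies_of_a_allele = {"AA": 2, "Aa": 1, "aa": 0}
    # Probability that a genotype occurs AND is observed.
    observed = {g: freqs[g] * (1.0 - miss[g]) for g in freqs}
    total = sum(observed.values())
    return sum(copies_of_a_allele[g] * observed[g] for g in observed) / (2.0 * total)


p_true = 0.3

# Missing at random: every genotype is missing with the same probability.
at_random = {"AA": 0.1, "Aa": 0.1, "aa": 0.1}

# Informative missingness (illustrative values): AA is lost most often.
informative = {"AA": 0.3, "Aa": 0.1, "aa": 0.05}

print("true frequency:      ", p_true)
print("missing at random:   ", naive_allele_freq(p_true, at_random))
print("informatively missing:", naive_allele_freq(p_true, informative))
```

With equal missingness the complete-case estimate recovers the true frequency (up to floating-point rounding). With the illustrative genotype-dependent probabilities it falls below 0.3, because carriers of allele A are under-represented among the observed genotypes.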
Related resources
Haplotype analysis in the presence of informatively missing genotype data.
It is common to have missing genotypes in practical genetic studies, but the exact underlying missing data mechanism is generally unknown to the investigators. Although some statistical methods can handle missing data, they usually assume that genotypes are missing at random, that is, at a given marker, different genotypes and different alleles are missing with the same probability. These inclu...
GENECOUNTING: haplotype analysis with missing genotypes
A general algorithm is described for haplotype analysis of unrelated individuals with missing genotypes. It can handle problems involving multiple polymorphic markers with missing data.
The impact of complex informative missingness on the validity of the transmission/disequilibrium test (TDT)
The transmission/disequilibrium test was introduced to test for linkage and association between a marker and a putative disease locus using case-parent triads. Several extensions have been proposed to accommodate incomplete triads. Some strategies assumed that parental genotypes were missing completely at random and some methods allowed informative missingness for parental genotypes. However, t...
Posters IMPUTATION OF MISSING GENOTYPES IN HIGH DENSITY SNP DATA
The accuracy and computational complexity of five methods to impute missing genotypes in high density SNP data were investigated. The haplotype reconstruction package fastPHASE reached the highest accuracies (91% to 98%) for varying proportions (0.2% to 8%) of missing genotypes. Alternative methods based on principal component analysis were less accurate (67% to 94%), but their computational dem...
The Incomplete Perfect Phylogeny Haplotype Problem
The problem of resolving genotypes into haplotypes, under the perfect phylogeny model, has been under intensive study recently. All studies so far handled missing data entries in a heuristic manner. We prove that the perfect phylogeny haplotype problem is NP-complete when some of the data entries are missing, even when the phylogeny is rooted. We define a biologically motivated probabilistic mo...
Journal title:
- Communications in Statistics: Theory and Methods
Volume 38, Issue 18
Pages: -
Publication date: 2009